We thank all the reviewers for their careful reading and constructive comments.

Note that following the RUM interpretation of the MNL model (please see the response to Rev. #3 for details), the score […]

Re. Applications: As discussed in the Introduction, some motivating applications of our problem lie in various kinds of […]

Moreover, as we clarified in Rem. 1 and 2, for the special case of two-sized subsets (i.e. when k = 2), our regret […] Bandits to subsetwise feedback (Multi-Dueling Bandits) also use the same notion of regret as ours (see Refs. [11, 39]).

We sincerely request the reviewers to kindly reconsider their scores in light of the above clarifications.
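To make the RUM interpretation concrete: under the MNL model, the probability of choosing item i from a subset S is theta_i / sum_{j in S} theta_j, which is equivalent to adding i.i.d. standard Gumbel noise to the log-scores and taking the argmax (the Gumbel-max view of random utility). A minimal sketch, where the `theta` values and the subset are hypothetical placeholders rather than quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mnl_choice_probs(theta, subset):
    """MNL choice probabilities: P(i | S) = theta_i / sum_{j in S} theta_j."""
    w = np.array([theta[i] for i in subset], dtype=float)
    return w / w.sum()

def rum_choice(theta, subset):
    """RUM view: argmax of log-score plus i.i.d. Gumbel noise."""
    u = np.log([theta[i] for i in subset]) + rng.gumbel(size=len(subset))
    return subset[int(np.argmax(u))]

# hypothetical scores and subset, purely for illustration
theta = {0: 1.0, 1: 2.0, 2: 3.0}
S = [0, 1, 2]

probs = mnl_choice_probs(theta, S)        # [1/6, 2/6, 3/6]
counts = np.zeros(len(S))
for _ in range(20000):                    # empirical check of the equivalence
    counts[rum_choice(theta, S)] += 1
```

The empirical frequencies from the Gumbel-noise sampler match the closed-form MNL probabilities, which is the equivalence the RUM interpretation rests on.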
We thank the reviewers for their careful reading and constructive comments, and have already updated the manuscript.

Re. Meta-Imitation Learning (Meta-IL): we implemented a meta variant of DAgger and tuned it carefully. As shown in Table 1 and Figure 1, SMILe's performance is […]

As such, in the Ant 2D Goal task we trained "fully observed" policies (using SAC) which observe […] As shown in the Figure above, fully observable policies were not able to solve this sparse-reward task. Generalizing to the 25 testing dynamics is highly non-trivial; Meta-DAgger performs substantially worse than SMILe on this task.
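For context on the Meta-DAgger baseline: DAgger alternates between rolling out the current policy, relabeling the visited states with expert actions, aggregating them into the dataset, and refitting. A minimal single-task sketch on a toy 2D goal-reaching problem; the environment, expert, and linear policy below are illustrative stand-ins, not the Ant task or our Meta-DAgger implementation:

```python
import numpy as np

GOAL = np.array([1.0, 1.0])  # hypothetical 2D goal

def env_step(state, action):
    return state + 0.1 * action

def expert(state):
    return GOAL - state  # toy expert: move straight toward the goal

def policy_fit(states, actions):
    # least-squares linear policy a = [s, 1] @ W
    X = np.hstack([states, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return lambda s: np.append(s, 1.0) @ W

def dagger(n_iters=3, horizon=30):
    S, A = [], []
    policy = expert  # iteration 0 rolls out the expert itself
    for _ in range(n_iters):
        s = np.zeros(2)
        for _ in range(horizon):
            S.append(s.copy())
            A.append(expert(s))        # expert relabels visited states
            s = env_step(s, policy(s)) # but the CURRENT policy drives
        policy = policy_fit(np.array(S), np.array(A))  # aggregate + refit
    return policy

pi = dagger()
s = np.zeros(2)
for _ in range(100):   # roll out the learned policy
    s = env_step(s, pi(s))
```

On this toy problem the expert is exactly linear, so the aggregated fit recovers it and the learned rollout converges to the goal; the point of the sketch is only the roll-out / relabel / aggregate / refit structure of the loop.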